<!DOCTYPE html>
<HTML lang = "en">
<HEAD>
  <meta charset="UTF-8"/>
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
  <title>PDEs, Convolutions, and the Mathematics of Locality</title>
  

  <script type="text/x-mathjax-config">
    MathJax.Hub.Config({
      tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},
      TeX: { equationNumbers: { autoNumber: "AMS" } }
    });
  </script>

  <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
  </script>

  
<style>
pre.hljl {
    border: 1px solid #ccc;
    margin: 5px;
    padding: 5px;
    overflow-x: auto;
    color: rgb(68,68,68); background-color: rgb(251,251,251); }
pre.hljl > span.hljl-t { }
pre.hljl > span.hljl-w { }
pre.hljl > span.hljl-e { }
pre.hljl > span.hljl-eB { }
pre.hljl > span.hljl-o { }
pre.hljl > span.hljl-k { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kc { color: rgb(59,151,46); font-style: italic; }
pre.hljl > span.hljl-kd { color: rgb(214,102,97); font-style: italic; }
pre.hljl > span.hljl-kn { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kp { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kr { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kt { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-n { }
pre.hljl > span.hljl-na { }
pre.hljl > span.hljl-nb { }
pre.hljl > span.hljl-nbp { }
pre.hljl > span.hljl-nc { }
pre.hljl > span.hljl-ncB { }
pre.hljl > span.hljl-nd { color: rgb(214,102,97); }
pre.hljl > span.hljl-ne { }
pre.hljl > span.hljl-neB { }
pre.hljl > span.hljl-nf { color: rgb(66,102,213); }
pre.hljl > span.hljl-nfm { color: rgb(66,102,213); }
pre.hljl > span.hljl-np { }
pre.hljl > span.hljl-nl { }
pre.hljl > span.hljl-nn { }
pre.hljl > span.hljl-no { }
pre.hljl > span.hljl-nt { }
pre.hljl > span.hljl-nv { }
pre.hljl > span.hljl-nvc { }
pre.hljl > span.hljl-nvg { }
pre.hljl > span.hljl-nvi { }
pre.hljl > span.hljl-nvm { }
pre.hljl > span.hljl-l { }
pre.hljl > span.hljl-ld { color: rgb(148,91,176); font-style: italic; }
pre.hljl > span.hljl-s { color: rgb(201,61,57); }
pre.hljl > span.hljl-sa { color: rgb(201,61,57); }
pre.hljl > span.hljl-sb { color: rgb(201,61,57); }
pre.hljl > span.hljl-sc { color: rgb(201,61,57); }
pre.hljl > span.hljl-sd { color: rgb(201,61,57); }
pre.hljl > span.hljl-sdB { color: rgb(201,61,57); }
pre.hljl > span.hljl-sdC { color: rgb(201,61,57); }
pre.hljl > span.hljl-se { color: rgb(59,151,46); }
pre.hljl > span.hljl-sh { color: rgb(201,61,57); }
pre.hljl > span.hljl-si { }
pre.hljl > span.hljl-so { color: rgb(201,61,57); }
pre.hljl > span.hljl-sr { color: rgb(201,61,57); }
pre.hljl > span.hljl-ss { color: rgb(201,61,57); }
pre.hljl > span.hljl-ssB { color: rgb(201,61,57); }
pre.hljl > span.hljl-nB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nbB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nfB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nh { color: rgb(59,151,46); }
pre.hljl > span.hljl-ni { color: rgb(59,151,46); }
pre.hljl > span.hljl-nil { color: rgb(59,151,46); }
pre.hljl > span.hljl-noB { color: rgb(59,151,46); }
pre.hljl > span.hljl-oB { color: rgb(102,102,102); font-weight: bold; }
pre.hljl > span.hljl-ow { color: rgb(102,102,102); font-weight: bold; }
pre.hljl > span.hljl-p { }
pre.hljl > span.hljl-c { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-ch { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cm { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cp { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cpB { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cs { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-csB { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-g { }
pre.hljl > span.hljl-gd { }
pre.hljl > span.hljl-ge { }
pre.hljl > span.hljl-geB { }
pre.hljl > span.hljl-gh { }
pre.hljl > span.hljl-gi { }
pre.hljl > span.hljl-go { }
pre.hljl > span.hljl-gp { }
pre.hljl > span.hljl-gs { }
pre.hljl > span.hljl-gsB { }
pre.hljl > span.hljl-gt { }
</style>



  <style type="text/css">
  @font-face {
  font-style: normal;
  font-weight: 300;
}
@font-face {
  font-style: normal;
  font-weight: 400;
}
@font-face {
  font-style: normal;
  font-weight: 600;
}
html {
  font-family: sans-serif; /* 1 */
  -ms-text-size-adjust: 100%; /* 2 */
  -webkit-text-size-adjust: 100%; /* 2 */
}
body {
  margin: 0;
}
article,
aside,
details,
figcaption,
figure,
footer,
header,
hgroup,
main,
menu,
nav,
section,
summary {
  display: block;
}
audio,
canvas,
progress,
video {
  display: inline-block; /* 1 */
  vertical-align: baseline; /* 2 */
}
audio:not([controls]) {
  display: none;
  height: 0;
}
[hidden],
template {
  display: none;
}
a:active,
a:hover {
  outline: 0;
}
abbr[title] {
  border-bottom: 1px dotted;
}
b,
strong {
  font-weight: bold;
}
dfn {
  font-style: italic;
}
h1 {
  font-size: 2em;
  margin: 0.67em 0;
}
mark {
  background: #ff0;
  color: #000;
}
small {
  font-size: 80%;
}
sub,
sup {
  font-size: 75%;
  line-height: 0;
  position: relative;
  vertical-align: baseline;
}
sup {
  top: -0.5em;
}
sub {
  bottom: -0.25em;
}
img {
  border: 0;
}
svg:not(:root) {
  overflow: hidden;
}
figure {
  margin: 1em 40px;
}
hr {
  -moz-box-sizing: content-box;
  box-sizing: content-box;
  height: 0;
}
pre {
  overflow: auto;
}
code,
kbd,
pre,
samp {
  font-family: monospace, monospace;
  font-size: 1em;
}
button,
input,
optgroup,
select,
textarea {
  color: inherit; /* 1 */
  font: inherit; /* 2 */
  margin: 0; /* 3 */
}
button {
  overflow: visible;
}
button,
select {
  text-transform: none;
}
button,
html input[type="button"], /* 1 */
input[type="reset"],
input[type="submit"] {
  -webkit-appearance: button; /* 2 */
  cursor: pointer; /* 3 */
}
button[disabled],
html input[disabled] {
  cursor: default;
}
button::-moz-focus-inner,
input::-moz-focus-inner {
  border: 0;
  padding: 0;
}
input {
  line-height: normal;
}
input[type="checkbox"],
input[type="radio"] {
  box-sizing: border-box; /* 1 */
  padding: 0; /* 2 */
}
input[type="number"]::-webkit-inner-spin-button,
input[type="number"]::-webkit-outer-spin-button {
  height: auto;
}
input[type="search"] {
  -webkit-appearance: textfield; /* 1 */
  -moz-box-sizing: content-box;
  -webkit-box-sizing: content-box; /* 2 */
  box-sizing: content-box;
}
input[type="search"]::-webkit-search-cancel-button,
input[type="search"]::-webkit-search-decoration {
  -webkit-appearance: none;
}
fieldset {
  border: 1px solid #c0c0c0;
  margin: 0 2px;
  padding: 0.35em 0.625em 0.75em;
}
legend {
  border: 0; /* 1 */
  padding: 0; /* 2 */
}
textarea {
  overflow: auto;
}
optgroup {
  font-weight: bold;
}
table {
  font-family: monospace, monospace;
  font-size : 0.8em;
  border-collapse: collapse;
  border-spacing: 0;
}
td,
th {
  padding: 0;
}
thead th {
    border-bottom: 1px solid black;
    background-color: white;
}
tr:nth-child(odd){
  background-color: rgb(248,248,248);
}


/*
* Skeleton V2.0.4
* Copyright 2014, Dave Gamache
* www.getskeleton.com
* Free to use under the MIT license.
* http://www.opensource.org/licenses/mit-license.php
* 12/29/2014
*/
.container {
  position: relative;
  width: 100%;
  max-width: 960px;
  margin: 0 auto;
  padding: 0 20px;
  box-sizing: border-box; }
.column,
.columns {
  width: 100%;
  float: left;
  box-sizing: border-box; }
@media (min-width: 400px) {
  .container {
    width: 85%;
    padding: 0; }
}
@media (min-width: 550px) {
  .container {
    width: 80%; }
  .column,
  .columns {
    margin-left: 4%; }
  .column:first-child,
  .columns:first-child {
    margin-left: 0; }

  .one.column,
  .one.columns                    { width: 4.66666666667%; }
  .two.columns                    { width: 13.3333333333%; }
  .three.columns                  { width: 22%;            }
  .four.columns                   { width: 30.6666666667%; }
  .five.columns                   { width: 39.3333333333%; }
  .six.columns                    { width: 48%;            }
  .seven.columns                  { width: 56.6666666667%; }
  .eight.columns                  { width: 65.3333333333%; }
  .nine.columns                   { width: 74.0%;          }
  .ten.columns                    { width: 82.6666666667%; }
  .eleven.columns                 { width: 91.3333333333%; }
  .twelve.columns                 { width: 100%; margin-left: 0; }

  .one-third.column               { width: 30.6666666667%; }
  .two-thirds.column              { width: 65.3333333333%; }

  .one-half.column                { width: 48%; }

  /* Offsets */
  .offset-by-one.column,
  .offset-by-one.columns          { margin-left: 8.66666666667%; }
  .offset-by-two.column,
  .offset-by-two.columns          { margin-left: 17.3333333333%; }
  .offset-by-three.column,
  .offset-by-three.columns        { margin-left: 26%;            }
  .offset-by-four.column,
  .offset-by-four.columns         { margin-left: 34.6666666667%; }
  .offset-by-five.column,
  .offset-by-five.columns         { margin-left: 43.3333333333%; }
  .offset-by-six.column,
  .offset-by-six.columns          { margin-left: 52%;            }
  .offset-by-seven.column,
  .offset-by-seven.columns        { margin-left: 60.6666666667%; }
  .offset-by-eight.column,
  .offset-by-eight.columns        { margin-left: 69.3333333333%; }
  .offset-by-nine.column,
  .offset-by-nine.columns         { margin-left: 78.0%;          }
  .offset-by-ten.column,
  .offset-by-ten.columns          { margin-left: 86.6666666667%; }
  .offset-by-eleven.column,
  .offset-by-eleven.columns       { margin-left: 95.3333333333%; }

  .offset-by-one-third.column,
  .offset-by-one-third.columns    { margin-left: 34.6666666667%; }
  .offset-by-two-thirds.column,
  .offset-by-two-thirds.columns   { margin-left: 69.3333333333%; }

  .offset-by-one-half.column,
  .offset-by-one-half.columns     { margin-left: 52%; }

}
html {
  font-size: 62.5%; }
body {
  font-size: 1.5em; /* currently ems cause chrome bug misinterpreting rems on body element */
  line-height: 1.6;
  font-weight: 400;
  font-family: "Raleway", "HelveticaNeue", "Helvetica Neue", Helvetica, Arial, sans-serif;
  color: #222; }
h1, h2, h3, h4, h5, h6 {
  margin-top: 0;
  margin-bottom: 2rem;
  font-weight: 300; }
h1 { font-size: 3.6rem; line-height: 1.2;  letter-spacing: -.1rem;}
h2 { font-size: 3.4rem; line-height: 1.25; letter-spacing: -.1rem; }
h3 { font-size: 3.2rem; line-height: 1.3;  letter-spacing: -.1rem; }
h4 { font-size: 2.8rem; line-height: 1.35; letter-spacing: -.08rem; }
h5 { font-size: 2.4rem; line-height: 1.5;  letter-spacing: -.05rem; }
h6 { font-size: 1.5rem; line-height: 1.6;  letter-spacing: 0; }

p {
  margin-top: 0; }
a {
  color: #1EAEDB; }
a:hover {
  color: #0FA0CE; }
.button,
button,
input[type="submit"],
input[type="reset"],
input[type="button"] {
  display: inline-block;
  height: 38px;
  padding: 0 30px;
  color: #555;
  text-align: center;
  font-size: 11px;
  font-weight: 600;
  line-height: 38px;
  letter-spacing: .1rem;
  text-transform: uppercase;
  text-decoration: none;
  white-space: nowrap;
  background-color: transparent;
  border-radius: 4px;
  border: 1px solid #bbb;
  cursor: pointer;
  box-sizing: border-box; }
.button:hover,
button:hover,
input[type="submit"]:hover,
input[type="reset"]:hover,
input[type="button"]:hover,
.button:focus,
button:focus,
input[type="submit"]:focus,
input[type="reset"]:focus,
input[type="button"]:focus {
  color: #333;
  border-color: #888;
  outline: 0; }
.button.button-primary,
button.button-primary,
input[type="submit"].button-primary,
input[type="reset"].button-primary,
input[type="button"].button-primary {
  color: #FFF;
  background-color: #33C3F0;
  border-color: #33C3F0; }
.button.button-primary:hover,
button.button-primary:hover,
input[type="submit"].button-primary:hover,
input[type="reset"].button-primary:hover,
input[type="button"].button-primary:hover,
.button.button-primary:focus,
button.button-primary:focus,
input[type="submit"].button-primary:focus,
input[type="reset"].button-primary:focus,
input[type="button"].button-primary:focus {
  color: #FFF;
  background-color: #1EAEDB;
  border-color: #1EAEDB; }
input[type="email"],
input[type="number"],
input[type="search"],
input[type="text"],
input[type="tel"],
input[type="url"],
input[type="password"],
textarea,
select {
  height: 38px;
  padding: 6px 10px; /* The 6px vertically centers text on FF, ignored by Webkit */
  background-color: #fff;
  border: 1px solid #D1D1D1;
  border-radius: 4px;
  box-shadow: none;
  box-sizing: border-box; }
/* Removes awkward default styles on some inputs for iOS */
input[type="email"],
input[type="number"],
input[type="search"],
input[type="text"],
input[type="tel"],
input[type="url"],
input[type="password"],
textarea {
  -webkit-appearance: none;
     -moz-appearance: none;
          appearance: none; }
textarea {
  min-height: 65px;
  padding-top: 6px;
  padding-bottom: 6px; }
input[type="email"]:focus,
input[type="number"]:focus,
input[type="search"]:focus,
input[type="text"]:focus,
input[type="tel"]:focus,
input[type="url"]:focus,
input[type="password"]:focus,
textarea:focus,
select:focus {
  border: 1px solid #33C3F0;
  outline: 0; }
label,
legend {
  display: block;
  margin-bottom: .5rem;
  font-weight: 600; }
fieldset {
  padding: 0;
  border-width: 0; }
input[type="checkbox"],
input[type="radio"] {
  display: inline; }
label > .label-body {
  display: inline-block;
  margin-left: .5rem;
  font-weight: normal; }
ul {
  list-style: circle; }
ol {
  list-style: decimal; }
ul ul,
ul ol,
ol ol,
ol ul {
  margin: 1.5rem 0 1.5rem 3rem;
  font-size: 90%; }
li > p {margin : 0;}
th,
td {
  padding: 12px 15px;
  text-align: left;
  border-bottom: 1px solid #E1E1E1; }
th:first-child,
td:first-child {
  padding-left: 0; }
th:last-child,
td:last-child {
  padding-right: 0; }
button,
.button {
  margin-bottom: 1rem; }
input,
textarea,
select,
fieldset {
  margin-bottom: 1.5rem; }
pre,
blockquote,
dl,
figure,
table,
p,
ul,
ol,
form {
  margin-bottom: 1.0rem; }
.u-full-width {
  width: 100%;
  box-sizing: border-box; }
.u-max-full-width {
  max-width: 100%;
  box-sizing: border-box; }
.u-pull-right {
  float: right; }
.u-pull-left {
  float: left; }
hr {
  margin-top: 3rem;
  margin-bottom: 3.5rem;
  border-width: 0;
  border-top: 1px solid #E1E1E1; }
.container:after,
.row:after,
.u-cf {
  content: "";
  display: table;
  clear: both; }

pre {
  display: block;
  padding: 9.5px;
  margin: 0 0 10px;
  font-size: 13px;
  line-height: 1.42857143;
  word-break: break-all;
  word-wrap: break-word;
  border: 1px solid #ccc;
  border-radius: 4px;
}

pre.hljl {
  margin: 0 0 10px;
  display: block;
  background: #f5f5f5;
  border-radius: 4px;
  padding : 5px;
}

pre.output {
  background: #ffffff;
}

pre.code {
  background: #ffffff;
}

pre.julia-error {
  color : red
}

code,
kbd,
pre,
samp {
  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
  font-size: 0.9em;
}


@media (min-width: 400px) {}
@media (min-width: 550px) {}
@media (min-width: 750px) {}
@media (min-width: 1000px) {}
@media (min-width: 1200px) {}

h1.title {margin-top : 20px}
img {max-width : 100%}
div.title {text-align: center;}

  </style>
</HEAD>

<BODY>
  <div class ="container">
    <div class = "row">
      <div class = "col-md-12 twelve columns">
        <div class="title">
          <h1 class="title">PDEs, Convolutions, and the Mathematics of Locality</h1>
          <h5>Chris Rackauckas</h5>
          <h5>December 4th, 2020</h5>
        </div>

        <p>At this point we have identified how the worlds of machine learning and scientific computing collide by looking at the parameter estimation problem. Training neural networks is parameter estimation of a function <code>f</code> where <code>f</code> is a neural network. Backpropogation of a neural network is simply the adjoint problem for <code>f</code>, and it falls under the class of methods used in reverse-mode automatic differentiation. But this story also extends to structure. Recurrent neural networks are the Euler discretization of a continuous recurrent neural network, also known as a neural ordinary differential equation.</p>
<p>Given all of these relations, our next focus will be on the other class of commonly used neural networks: the <em>convolutional neural network</em> &#40;CNN&#41;. It turns out that in this case there is also a clear analogue to convolutional neural networks in traditional scientific computing, and this is seen in discretizations of partial differential equations. To see this, we will first describe the convolution operation that is central to the CNN and see how this object naturally arises in numerical partial differential equations.</p>
<h2>Convolutional Neural Networks</h2>
<p>The purpose of a convolutional neural network is to be a network which makes use of the spatial structure of an image. An image is a 3-dimensional object: width, height, and 3 color channels. The convolutional operations keeps this structure intact and acts against this object is a 3-tensor. A convolutional layer is a function that applies a <em>stencil</em> to each point. This is illustrated by the following animation:</p>
<p><img src="https://miro.medium.com/max/526/1*GcI7G-JLAQiEoCON7xFbhg.gif" alt="convolution" /></p>
<p>This is the 2D stencil:</p>
<pre><code>1  0  1
0  1  0
1  0  1</code></pre>
<p>which is then applied to the matrix at each inner point to go from an NxNx3 matrix to an &#40;N-2&#41;x&#40;N-2&#41;x3 matrix.</p>
<p>Another operation used with convolutions is the <em>pooling layer</em>. For example, the <em>maxpool</em> layer is stencil which takes the maximum of the the value and its neighbor, and the <em>meanpool</em> takes the mean over the nearby values, i.e. it is equivalent to the stencil:</p>
<pre><code>1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9</code></pre>
<p>A convolutional neural network is then composed of layers of this form. We can express this mathematically by letting <span class="math">$conv(x;S)$</span> as the convolution of <span class="math">$x$</span> given a stencil <span class="math">$S$</span>. If we let <span class="math">$dense(x;W,b,σ) = σ(W*x + b)$</span> as a layer from a standard neural network, then deep convolutional neural networks are of forms like:</p>
<p class="math">\[
CNN(x) = dense(conv(maxpool(conv(x))))
\]</p>
<p>which can be expressed in Flux.jl syntax as:</p>


<pre class='hljl'>
<span class='hljl-n'>m</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-t'>
  </span><span class='hljl-nf'>Conv</span><span class='hljl-p'>((</span><span class='hljl-ni'>2</span><span class='hljl-p'>,</span><span class='hljl-ni'>2</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-oB'>=&gt;</span><span class='hljl-ni'>16</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>relu</span><span class='hljl-p'>),</span><span class='hljl-t'>
  </span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>-&gt;</span><span class='hljl-t'> </span><span class='hljl-nf'>maxpool</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-p'>(</span><span class='hljl-ni'>2</span><span class='hljl-p'>,</span><span class='hljl-ni'>2</span><span class='hljl-p'>)),</span><span class='hljl-t'>
  </span><span class='hljl-nf'>Conv</span><span class='hljl-p'>((</span><span class='hljl-ni'>2</span><span class='hljl-p'>,</span><span class='hljl-ni'>2</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-ni'>16</span><span class='hljl-oB'>=&gt;</span><span class='hljl-ni'>8</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>relu</span><span class='hljl-p'>),</span><span class='hljl-t'>
  </span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>-&gt;</span><span class='hljl-t'> </span><span class='hljl-nf'>maxpool</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-p'>(</span><span class='hljl-ni'>2</span><span class='hljl-p'>,</span><span class='hljl-ni'>2</span><span class='hljl-p'>)),</span><span class='hljl-t'>
  </span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>-&gt;</span><span class='hljl-t'> </span><span class='hljl-nf'>reshape</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-oB'>:</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-nf'>size</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-ni'>4</span><span class='hljl-p'>)),</span><span class='hljl-t'>
  </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>288</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-ni'>10</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-n'>softmax</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>|&gt;</span><span class='hljl-t'> </span><span class='hljl-n'>gpu</span>
</pre>


<h2>Discretizations of Partial Differential Equations</h2>
<p>Now let&#39;s investigate discertizations of partial differential equations. A canonical differential equation to start with is the Poisson equation. This is the equation:</p>
<p class="math">\[
u_{xx} = f(x)
\]</p>
<p>where here we have that subscripts correspond to partial derivatives, i.e. this syntax stands for the partial differential equation:</p>
<p class="math">\[
\frac{d^2u}{dx^2} = f(x)
\]</p>
<p>In this case, <span class="math">$f$</span> is some given data and the goal is to find the <span class="math">$u$</span> that satisfies this equation. There are two ways this is generally done:</p>
<ol>
<li><p>Expand out the derivative in terms of Taylor series approximations.</p>
</li>
<li><p>Expand out <span class="math">$u$</span> in terms of some function basis.</p>
</li>
</ol>
<h3>Finite Difference Discretizations</h3>
<p>Let&#39;s start by looking at Taylor series approximations to the derivative. In this case, we will use what&#39;s known as finite differences. The simplest finite difference approximation is known as the first order forward difference. This is commonly denoted as</p>
<p class="math">\[
\delta_{+}u=\frac{u(x+\Delta x)-u(x)}{\Delta x}
\]</p>
<p>This looks like a derivative, and we think it&#39;s a derivative as <span class="math">$\Delta x\rightarrow 0$</span>, but let&#39;s show that this approximation is meaningful. Assume that <span class="math">$u$</span> is sufficiently nice. Then from a Taylor series we have that</p>
<p class="math">\[
u(x+\Delta x)=u(x)+\Delta xu^{\prime}(x)+\mathcal{O}(\Delta x^{2})
\]</p>
<p>&#40;here I write <span class="math">$\left(\Delta x\right)^{2}$</span> as <span class="math">$\Delta x^{2}$</span> out of convenience, note that those two terms are not necessarily the same&#41;. That term on the end is called “Big-O Notation”. What is means is that those terms are asymtopically like <span class="math">$\Delta x^{2}$</span>. If <span class="math">$\Delta x$</span> is small, then <span class="math">$\Delta x^{2}\ll\Delta x$</span> and so we can think of those terms as smaller than any of the terms we show in the expansion. By simplification notice that we get</p>
<p class="math">\[
\frac{u(x+\Delta x)-u(x)}{\Delta x}=u^{\prime}(x)+\mathcal{O}(\Delta x)
\]</p>
<p>This means that <span class="math">$\delta_{+}$</span> is correct up to first order, where the <span class="math">$\mathcal{O}(\Delta x)$</span> portion that we dropped is the error. Thus <span class="math">$\delta_{+}$</span> is a first order approximation.</p>
<p>Notice that the same proof shows that the backwards difference,</p>
<p class="math">\[
\delta_{-}u=\frac{u(x)-u(x-\Delta x)}{\Delta x}
\]</p>
<p>is first order.</p>
<h4>Second Order Approximations to the First Derivative</h4>
<p>Now let&#39;s look at the following:</p>
<p class="math">\[
\delta_{0}u=\frac{u(x+\Delta x)-u(x-\Delta x)}{2\Delta x}.
\]</p>
<p>The claim is this differencing scheme is second order. To show this, we once again turn to Taylor Series. Let&#39;s do this for both terms:</p>
<p class="math">\[
u(x+\Delta x)	=u(x)+\Delta xu^{\prime}(x)+\frac{\Delta x^{2}}{2}u^{\prime\prime}(x)+\mathcal{O}(\Delta x^{3})
\]</p>
<p class="math">\[
u(x-\Delta x)	=u(x)-\Delta xu^{\prime}(x)+\frac{\Delta x^{2}}{2}u^{\prime\prime}(x)+\mathcal{O}(\Delta x^{3})
\]</p>
<p>Now we subtract the two:</p>
<p class="math">\[
u(x+\Delta x)-u(x-\Delta x)=2\Delta xu^{\prime}(x)+\mathcal{O}(\Delta x^{3})
\]</p>
<p>and thus we move tems around to get</p>
<p class="math">\[
\delta_{0}u=\frac{u(x+\Delta x)-u(x-\Delta x)}{2\Delta x}=u^{\prime}(x)+\mathcal{O}\left(\Delta x^{2}\right)
\]</p>
<p>What does this improvement mean? Let&#39;s say we go from <span class="math">$\Delta x$</span> to <span class="math">$\frac{\Delta x}{2}$</span>. Then while the error from the first order method is around <span class="math">$\frac{1}{2}$</span> the original error, the error from the central differencing method is <span class="math">$\frac{1}{4}$</span> the original error&#33; When trying to get an accurate solution, this quadratic reduction can make quite a difference in the number of required points.</p>
<h4>Second Derivative Central Difference</h4>
<p>Now we want a second derivative approximation. Let&#39;s show the classic central difference formula for the second derivative:</p>
<p class="math">\[
\delta_{0}^{2}u=\frac{u(x+\Delta x)-2u(x)+u(x-\Delta x)}{\Delta x^{2}}
\]</p>
<p>is second order. To do so, we expand out the two terms:</p>
<p class="math">\[
u(x+\Delta x)	=u(x)+\Delta xu^{\prime}(x)+\frac{\Delta x^{2}}{2}u^{\prime\prime}(x)+\frac{\Delta x^{3}}{6}u^{\prime\prime\prime}(x)+\mathcal{O}\left(\Delta x^{4}\right)
\]</p>
<p class="math">\[
u(x-\Delta x)	=u(x)-\Delta xu^{\prime}(x)+\frac{\Delta x^{2}}{2}u^{\prime\prime}(x)-\frac{\Delta x^{3}}{6}u^{\prime\prime\prime}(x)+\mathcal{O}\left(\Delta x^{4}\right)
\]</p>
<p>and now plug it in. It&#39;s clear the <span class="math">$u(x)$</span> cancels out. The opposite signs makes <span class="math">$u^{\prime}(x)$</span> cancel out, and then the same signs and cancellation makes the <span class="math">$u^{\prime\prime}$</span> term have a coefficient of 1. But, the opposite signs makes the <span class="math">$u^{\prime\prime\prime}$</span> term cancel out. Thus when we simplify and divide by <span class="math">$\Delta x^{2}$</span> we get</p>
<p class="math">\[
\frac{u(x+\Delta x)-2u(x)+u(x-\Delta x)}{\Delta x^{2}}=u^{\prime\prime}(x)+\mathcal{O}\left(\Delta x^{2}\right).
\]</p>
<h4>Finite Differencing from Polynomial Interpolation</h4>
<p>Finite differencing can also be derived from polynomial interpolation. Draw a line between two points. What is the approximation for the first derivative?</p>
<p class="math">\[
\delta_{+}u=\frac{u(x+\Delta x)-u(x)}{\Delta x}
\]</p>
<p>Now draw a quadratic through three points. i.e., given <span class="math">$u_{1}$</span>, <span class="math">$u_{2}$</span>, and <span class="math">$u_{3}$</span> at <span class="math">$x=0$</span>, <span class="math">$\Delta x$</span>, <span class="math">$2\Delta x$</span>, we want to find the interpolating polynomial</p>
<p class="math">\[
g(x)=a_{1}x^{2}+a_{2}x+a_{3}
\]</p>
<p>.</p>
<p>Setting <span class="math">$g(0)=u_{1}$</span>, <span class="math">$g(\Delta x)=u_{2}$</span>, and <span class="math">$g(2\Delta x)=u_{3}$</span>, we get the following relations:</p>
<p class="math">\[
u_{1}	=g(0)=a_{3}
\]</p>
<p class="math">\[
u_{2}	=g(\Delta x)=a_{1}\Delta x^{2}+a_{2}\Delta x+a_{3}
\]</p>
<p class="math">\[
u_{3}	=g(2\Delta x)=4a_{1}\Delta x^{2}+2a_{2}\Delta x+a_{3}
\]</p>
<p>which when we write in matrix form is:</p>
<p class="math">\[
\left(\begin{array}{ccc}
0 & 0 & 1\\
\Delta x^{2} & \Delta x & 1\\
4\Delta x^{2} & 2\Delta x & 1
\end{array}\right)\left(\begin{array}{c}
a_{1}\\
a_{2}\\
a_{3}
\end{array}\right)=\left(\begin{array}{c}
u_{1}\\
u_{2}\\
u_{3}
\end{array}\right)
\]</p>
<p>and thus we can invert the matrix to get the a&#39;s:</p>
<p class="math">\[
a_{1}	=\frac{u_{3}-2u_{2}+u_{1}}{2\Delta x^{2}}
\]</p>
<p class="math">\[
a_{2}	=\frac{-u_{3}+4u_{2}-3u_{1}}{2\Delta x}
\]</p>
<p class="math">\[
a_{3}	=u_{1}\text{ or }g(x)=\frac{u_{3}-2u_{2}-u_{1}}{2\Delta x^{2}}x^{2}+\frac{-u_{3}+4u_{2}-3u_{1}}{2\Delta x}x+u_{1}
\]</p>
<p>Now we can get derivative approximations from this. Notice for example that</p>
<p class="math">\[
g^{\prime}(x)=\frac{u_{3}-2u_{2}+u_{1}}{\Delta x^{2}}x+\frac{-u_{3}+4u_{2}-3u_{1}}{2\Delta x}
\]</p>
<p>Now what&#39;s the derivative at the middle point?</p>
<p class="math">\[
g^{\prime}\left(\Delta x\right)=\frac{u_{3}-2u_{2}+u_{1}}{\Delta x}+\frac{-u_{3}+4u_{2}-3u_{1}}{2\Delta x}=\frac{u_{3}-u_{1}}{2\Delta x}.
\]</p>
<p>And now check</p>
<p class="math">\[
g^{\prime\prime}(\Delta x)=\frac{u_{3}-2u_{2}+u_{1}}{\Delta x^{2}}
\]</p>
<p>which is the central derivative formula. This gives a systematic way of deriving higher order finite differencing formulas. In fact, this formulation allows one to derive finite difference formulae for non-evenly spaced grids as well&#33; The algorithm which automatically generates stencils from the interpolating polynomial forms is the Fornberg algorithm.</p>
<h4>Multidimensional Finite Difference Operations</h4>
<p>Now let&#39;s look at the multidimensional Poisson equation, commonly written as:</p>
<p class="math">\[
\Delta u = f(x,y)
\]</p>
<p>where <span class="math">$\Delta u = u_{xx} + u_{yy}$</span>. Using the logic of the previous sections, we can approximate the two derivatives to have:</p>
<p class="math">\[
\frac{u(x+\Delta x,y)-2u(x,y)+u(x-\Delta x,y)}{\Delta x^{2}} + \frac{u(x,y+\Delta y)-2u(x,y)+u(x-x,y-\Delta y)}{\Delta y^{2}}=u^{\prime\prime}(x)+\mathcal{O}\left(\Delta x^{2}\right) + \mathcal{O}\left(\Delta y^{2}\right).
\]</p>
<p>Notice that this is the stencil operation:</p>
<pre><code>0  1 0
1 -4 1
0  1 0</code></pre>
<p>This means that <strong>derivative discretizations are stencil or convolutional operations</strong>.</p>
<h2>Representation and Implementation of Stencil Operations</h2>
<h3>Stencil Operations as Sparse Matrices</h3>
<p>Stencil operations are linear operators, i.e. <span class="math">$S[x+\alpha y] = S[x] + \alpha S[y]$</span> for any sufficiently nice stencil operation <span class="math">$S$</span> &#40;note &quot;sufficiently nice&quot;: there is a &quot;stencil&quot; operation mentioned in the convolutional neural networks section which was not linear: which operation was it?&#41;. Now we write these operators as matrices. Notice that for the vector:</p>
<p class="math">\[
U=\left(\begin{array}{c}
u_{1}\\
\vdots\\
u_{n}
\end{array}\right),
\]</p>
<p>we have that</p>
<p class="math">\[
\delta_{+}U=\left(\begin{array}{c}
u_{2}-u_{1}\\
\vdots\\
u_{n}-u_{n-1}
\end{array}\right)
\]</p>
<p>and so</p>
<p class="math">\[
\delta_{+}=\left(\begin{array}{ccccc}
-1 & 1\\
 & -1 & 1\\
 &  & \ddots & \ddots\\
 &  &  & -1 & 1
\end{array}\right)
\]</p>
<p>We can do the same to understand the other operators. But notice something: this operator isn&#39;t square&#33; In order for this to be square, in order to know what happens at the endpoint, we need to know the boundary conditions. I.e., an assumption on the value or derivative at <span class="math">$u(0)$</span> or <span class="math">$u(1)$</span> is required in order to get the first/last rows of the matrix&#33;</p>
<p>Similarly, <span class="math">$\delta_{0}^{2}$</span> can be represented as the tridiagonal matrix of <code>&#91;1 -2 1&#93;</code>, also known as the Strang matrix.</p>
<p>Now let&#39;s think about the higher dimensional forms as a vector, i.e. <code>vec&#40;u&#41;</code>. In this case, what is the matrix <code>A</code> for which <code>reshape&#40;A*vec&#40;u&#41;,size&#40;u&#41;...&#41;</code> performs the higher dimensional Laplacian, i.e. <span class="math">$u_{xx} + u_{yy}$</span>? The answer is that it discretizes via Kronecker products to:</p>
<p class="math">\[
A=I_{y}\otimes A_{x}+A_{y}\otimes I_{x}
\]</p>
<p>or:</p>
<p class="math">\[
\frac{\partial^{2}}{\partial x^{2}}=\left(\begin{array}{cccc}
A_{x}\\
 & A_{x}\\
 &  & \ddots\\
 &  &  & A_{x}
\end{array}\right)
\]</p>
<p>and</p>
<p class="math">\[
\frac{\partial^{2}}{\partial y^{2}}=\left(\begin{array}{cccc}
-2I_{x} & I_{x}\\
I_{x} & -2I_{x} & I_{x}\\
 &  & \ddots\\
\\
\end{array}\right)
\]</p>
<p>To see why this is the case, understand it again as the stencil operation</p>
<pre><code>0  1 0
1 -4 1
0  1 0</code></pre>
<p>In this operation, at a point you still use the up and down neighbors, and thus this has a tridiagonal form since those are the immediate neighbors, but the next <span class="math">$y$</span> value is <span class="math">$N$</span> over, so this is where the block tridiagonal form comes for the stencil in the <span class="math">$y$</span> terms. When these are added together one receives the appropriate matrix. The Kronecker product effectively encodes this &quot;N over&quot; behavior. It also readily generalizes to <span class="math">$N$</span> dimensions. To see this for 3-dimensional Laplacians, <span class="math">$u_{xx} + u_{yy} + u_{zz}$</span>, notice that</p>
<p class="math">\[
A=I_z \otimes I_{y}\otimes A_{x} + I_z \otimes A_{y}\otimes I_{x} + A_z \otimes I_y \otimes I_x
\]</p>
<p>using the same reasoning about &quot;N&quot; over and &quot;N^2 over&quot;, and from this formulation it&#39;s clear how to generalize to arbitrary dimensions.</p>
<p>We note that there is an alternative representation as well for 2D forms. We can represent them with left and right matrix operations. When <span class="math">$u$</span> is represented as a matrix, notice that</p>
<p class="math">\[
A(u) = A_y u + u A_x
\]</p>
<p>where <span class="math">$A_y$</span> and <span class="math">$A_x$</span> are both the <code>&#91;1 -2 1&#93;</code> 1D tridiagonal stencil matrix, but by right multiplying it&#39;s occuring along the columns and left multiplying occurs along the rows. This then gives a semi-dense formulation of the stencil operation.</p>
<h3>Implementation via Stencil Compilers</h3>
<p>Sparse matrix implementations of stencils are fairly inefficient given the way that sparse matrices are represented &#40;lists of &#40;i,j,v&#41; pairs, which are then compressed into CSR or CSC formats&#41;. However, it moves in the right direction by noticing that the operation</p>
<pre><code>u&#91;i&#43;1,j&#93; &#43; u&#91;i,j&#43;1&#93; &#43; u&#91;i-1,j&#93; &#43; u&#91;i,j-1&#93; - 4u&#91;i,j&#93;</code></pre>
<p>is an inefficient way to walk through the data. The reason is because <code>u&#91;i,j&#43;1&#93;</code> is using values that are far away from <code>u&#91;i,j&#93;</code>, and thus they may not necessarily be in the cache.</p>
<p>Thus what is generally used is a <em>stencil compiler</em> which generates functions for stencil operations. These work by dividing the tensor into blocks on which the stencil is applied, where the blocks are small enough to allow the cache lines to fit the future points. This is a very deep computational topic that is beyond the scope of this course. Note one of the main reasons why NVIDA&#39;s CUDA dominates machine learning is because of its <code>cudnn</code> library, which is a very efficient GPU stencil computation library that is specifically tuned to NVIDIA&#39;s GPUs.</p>
<h2>Cross-Discipline Learning</h2>
<p>Given these relations, there is a lot each of the disciplines can learn from one another about stencil computations.</p>
<h3>What ML can learning from SciComp: Stability of Convolutional RNNs</h3>
<p>Stability of time-dependent partial differential equations is a long-known problem. Stability of an RNN defined by stencil computations is then stability of Euler discretizations of PDEs. Let&#39;s take a look at Von Neumann analysis of PDE stability.</p>
<p>Let&#39;s look at the error update equation. Write</p>
<p class="math">\[
e_{i}^{n}=u(x_{j},t_{n})-u_{j}^{n}
\]</p>
<p>For <span class="math">$e_{i}^{n}$</span>, as before, plug it in, add and subtract <span class="math">$u(x_{j},t^{n})=u_{j}^{n}$</span>, and then we get</p>
<p class="math">\[
e_{i}^{n+1}=e_{i}^{n}+\mu\left(e_{i+1}^{n}-2e_{i}^{n}+e_{i-1}^{n}\right)+\Delta t\tau_{i}^{n}
\]</p>
<p>where</p>
<p class="math">\[
\tau_{i}^{n}\sim\mathcal{O}(\Delta t)+\mathcal{O}(\Delta x^{2}).
\]</p>
<p>Stability requires that the homogenous equation goes to zero. Another way of saying that is that the propogation of errors has errors decrease their influence over time. Thus we look at:</p>
<p class="math">\[
e_{i}^{n+1}	=e_{i}^{n}+\mu\left(e_{i+1}^{n}-2e_{i}^{n}+e_{i-1}^{n}\right) =\left(1-2\mu\right)e_{i}^{n}+\mu e_{i+1}^{n}+\mu e_{i-1}^{n}
\]</p>
<p>A necessary condition for decreasing is then for all coefficients to be positive</p>
<p class="math">\[
1-2\mu\geq0
\]</p>
<p>or</p>
<p class="math">\[
\mu\leq\frac{1}{2}
\]</p>
<p>A more satisfying way may be to look at the generated ODE</p>
<p class="math">\[
u^{\prime}=Au
\]</p>
<p>where A is the matrix <span class="math">$\left[\mu,1-2\mu,\mu\right].$</span></p>
<p>But  finding the maximum eigenvalue is non-trivial. But for linear PDEs, one nice way to analyze the stability directly is to use the Fourier mode decomposition. This is known as Van Neumann stability analysis. To do this, decompose <span class="math">$U$</span> into the Fourier modes:</p>
<p class="math">\[
U(x,t)=\sum_{k}\hat{U}(t)e^{ikx}
\]</p>
<p>Since</p>
<p class="math">\[
x_{j}=j\Delta x,
\]</p>
<p>we can write this out as</p>
<p class="math">\[
U_{j}^{n}=\hat{U}^{n}e^{ikj\Delta x}
\]</p>
<p>and then plugging this into the FTCS scheme we get</p>
<p class="math">\[
\frac{\hat{U}^{n+1}e^{ikj\Delta x}-\hat{U}^{n}e^{ikj\Delta x}}{\Delta t}=\frac{\hat{U}^{n}e^{ik(j+1)\Delta x}-2\hat{U}^{n}e^{ikj\Delta x}+\hat{U}^{n}e^{ik(j-1)\Delta x}}{\Delta x^{2}}
\]</p>
<p>Let G be the growth factor, defined as</p>
<p class="math">\[
G=\frac{\hat{U}^{n+1}}{\hat{U}^{n}}
\]</p>
<p>and thus after cancelling we get</p>
<p class="math">\[
\frac{G-1}{\Delta t}=\frac{e^{ik\Delta x}-2+e^{-ik\Delta x}}{\Delta x^{2}}
\]</p>
<p>Since</p>
<p class="math">\[
e^{ik\Delta x}+e^{-ik\Delta x}=2\cos\left(k\Delta x\right),
\]</p>
<p>then we get</p>
<p class="math">\[
G=1-\mu\left(2\cos\left(k\Delta x\right)-2\right)
\]</p>
<p>and using the half angle formula</p>
<p class="math">\[
G=1-4\mu\sin^{2}\left(\frac{k\Delta x}{2}\right)
\]</p>
<p>In order to be stable, we require</p>
<p class="math">\[
\left|G\right|\leq1,
\]</p>
<p>which means</p>
<p class="math">\[
-1\leq1-4\mu\sin^{2}\left(\frac{k\Delta x}{2}\right)\leq1 \mu>0
\]</p>
<p>and so <span class="math">$\leq1$</span> is simple. Since <span class="math">$\sin^{2}(x)\leq1$</span>, then we can simplify this to</p>
<p class="math">\[
-1\leq1-4\mu
\]</p>
<p>and thus <span class="math">$\mu\leq\frac{1}{2}$</span>. With backwards Euler we get</p>
<p class="math">\[
\frac{G-1}{\Delta t}=\frac{G}{\Delta x^{2}}\left(e^{ik\Delta x}-2+e^{-ik\Delta x}\right)
\]</p>
<p>and thus get</p>
<p class="math">\[
G+4G\mu\sin^{2}\left(\frac{k\Delta x}{2}\right)=1
\]</p>
<p>and thus</p>
<p class="math">\[
G=\frac{1}{1+4\mu\sin^{2}\left(\frac{k\Delta x}{2}\right)}\leq1.
\]</p>
<h3>What SciComp can learn from ML: Moderate Generalizations to Partial Differential Equation Models</h3>
<p>Instead of using</p>
<p class="math">\[
\Delta u = f
\]</p>
<p>we can start with</p>
<p class="math">\[
S[u] = f
\]</p>
<p>a stencil computation, predefined to match a known partial differential equation operator, and then <em>transfer learn</em> the stencil to better match data. This is an approach which is starting to move down the lines of <em>physics-informed machine learning</em> that will be further explored in future lectures.</p>


        <HR/>
        <div class="footer">
          <p>
            Published from <a href="pdes_and_convolutions.jmd">pdes_and_convolutions.jmd</a>
            using <a href="http://github.com/JunoLab/Weave.jl">Weave.jl</a> v0.10.6 on 2020-12-04.
          </p>
        </div>
      </div>
    </div>
  </div>
</BODY>

</HTML>
